Cognate and Misspelling Features for Natural Language Identification
Authors
Abstract
We apply Support Vector Machines to distinguish among 11 native languages in the 2013 Native Language Identification Shared Task. We extend a set of common language identification features to include cognate interference and spelling mistakes. Our best results are obtained with a classifier that combines the cognate and misspelling features with word unigrams, word bigrams, character bigrams, and syntax production rules.
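To make the surface features named in the abstract concrete, the sketch below shows one way word unigrams, word bigrams, character bigrams, and a simple misspelling signal might be extracted from a learner essay. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names `tokenize` and `extract_features`, the toy `LEXICON`, and the rule that treats any token missing from the lexicon as a misspelling are all hypothetical. The cognate and syntax production rule features are omitted.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (assumption); a real system would use a full dictionary.
LEXICON = {"i", "like", "to", "study", "english", "the", "language"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters/apostrophes (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(text: str) -> Counter:
    """Return sparse count features, keyed by a feature-type prefix."""
    tokens = tokenize(text)
    feats: Counter = Counter()

    # Word unigrams.
    for tok in tokens:
        feats[f"w1={tok}"] += 1

    # Word bigrams: adjacent token pairs.
    for a, b in zip(tokens, tokens[1:]):
        feats[f"w2={a}_{b}"] += 1

    # Character bigrams over the lowercased raw text, spaces included.
    lowered = text.lower()
    for i in range(len(lowered) - 1):
        feats[f"c2={lowered[i:i + 2]}"] += 1

    # Hypothetical misspelling proxy: any token absent from the lexicon.
    for tok in tokens:
        if tok not in LEXICON:
            feats[f"oov={tok}"] += 1

    return feats


if __name__ == "__main__":
    for name, count in sorted(extract_features("I like to studdy English").items()):
        print(name, count)
```

For "I like to studdy English", the non-dictionary token "studdy" yields an `oov=studdy` feature alongside the n-gram counts. Because each essay becomes a plain dictionary of counts, the results could, for example, be vectorized with scikit-learn's `DictVectorizer` and passed to a linear SVM such as `LinearSVC`; that pairing is a suggestion, not a detail reported in the abstract.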
Similar resources
Automatic cognate identification with gap-weighted string subsequences
In this paper, we describe the problem of cognate identification in NLP. We introduce the idea of gap-weighted subsequences for discriminating cognates from non-cognates. We also propose a scheme to integrate phonetic features into the feature vectors for cognate identification. We show that subsequence-based features perform better than a state-of-the-art classifier for the purpose of cognate ide...
Effect of Cognate-Based Instruction Strategy on Vocabulary Learning Among Iranian EFL Learners
Cognates are words that share phonetic, orthographic, and semantic similarities across two or more languages. The aim of the present study was to investigate the effect of cognate-based instruction strategy on vocabulary learning among Iranian EFL learners. To achieve the goal of the study, 80 EFL learners (15-27 years old) took part in the study; all of them were...
Siamese convolutional networks based on phonetic features for cognate identification
In this paper, we explore the use of convolutional networks (ConvNets) for the purpose of cognate identification. We compare our architecture with binary classifiers based on string similarity measures on different language families. Our experiments show that convolutional networks achieve competitive results across concepts and across language families at the task of cognate identification.
Offline Language-free Writer Identification based on Speeded-up Robust Features
This article proposes an offline, language-free writer identification method based on speeded-up robust features (SURF) that proceeds through training, enrollment, and identification stages. In all stages, an isotropic box filter is first used to segment the handwritten text image into word regions (WRs). Then, the SURF descriptors (SUDs) of each word region and the corresponding scales and orientations (SOs) are extr...